Abstract. The distribution of overlaps of solutions of a random CSP is an indicator of the overall geometry of its solution space. For random k-SAT, nonrigorous methods from Statistical Physics support the validity of the "one step replica symmetry breaking" approach. Some of these predictions were rigorously confirmed in [MMZ05a, MMZ05b]. There it is proved that the overlap distribution of random k-SAT, k ≥ 9, has discontinuous support. Furthermore, Achlioptas and Ricci-Tersenghi [ART06] proved that, for random k-SAT, k ≥ 8, and constraint densities close enough to the phase transition:

- there exists an exponential number of clusters of satisfying assignments.

- the distance between satisfying assignments in different clusters is linear.

We aim to understand the structural properties of random CSP that lead to solution clustering. To this end, we prove two results on the cluster structure of solutions for binary CSP under the random model from [Mol02]:

1. For all constraint sets S (described in [CD04, Ist05]) such that SAT(S) has a sharp threshold, and all q ∈ (0, 1], q-overlap-SAT(S) has a sharp threshold (i.e. the first step of the approach in [MMZ05a] works in all nontrivial cases).

2. For any constraint density value c < 1, the set of solutions of a random instance of 2-SAT forms, w.h.p., a single cluster. Also, for any q ∈ (0, 1], such an instance has w.h.p. two satisfying assignments of overlap ≈ q. Thus, as expected from Statistical Physics predictions, the second step of the approach in [MMZ05a] fails for 2-SAT.

1 Introduction 

A great deal of insight into the complexity of random constraint satisfaction problems has come from studying phase transitions [MZ97]. Concepts from Statistical Physics, such as first-order phase transitions, backbones, or replica symmetry breaking, have helped to refine (and understand the limitations of) the empirical observation that the "hardest" instances are located at the transition point. In some cases the connection predicted by Statistical Physics can be made explicit in purely combinatorial terms. For instance, Monasson et al. [MZK+99b, MZK+99a] have suggested that first-order phase transitions are correlated with exponential complexity of Davis-Putnam algorithms on random unsatisfiable instances at the phase transition. This has been rigorously confirmed to a certain extent [AKKK01, ABM04, IPB05]. As for instances in the satisfiable phase, much of the intuition on their complexity comes again from Statistical Physics, via the so-called one step replica symmetry breaking (1-RSB) approach. The 1-RSB approach provides predictions on the geometric structure of the set of satisfying assignments of a random formula on n variables. The set of such assignments can be naturally viewed as a subgraph of the hypercube of dimension n, where two satisfying assignments are neighbors if they differ in the value of exactly one variable.
Physics considerations imply that for small values of the constraint density c the set of satisfying assignments forms a single cluster. The distribution of overlaps is peaked around a certain constant value. The range of possible overlaps (even those that are exponentially infrequent) is a continuous interval. In the presence of 1-RSB, for constraint density values higher than a critical value c_RSB (smaller than the unsatisfiability threshold c_unsat), the set of satisfying assignments splits into several clusters such that: (i) assignments in the same cluster all agree on a set of variables having linear size, and the distribution of overlaps of assignments in the same cluster is still concentrated around a constant; (ii) assignments in different clusters differ in Ω(n) variables; (iii) the distribution of all overlaps has discontinuous support (see Fig. 1(a); note that recent studies [KMRSZ07] suggest the existence of further phases below c_RSB, omitted for simplicity from the discussion and the figure). The geometry of satisfying assignments outlined above has implications for the complexity of heuristics such as local search, and of algorithms such as belief propagation or Davis-Putnam. The 1-RSB approach provides (nonrigorous) values for the location of the phase transition in random k-SAT [MMZ06] that seem to match the experimental evidence. Algorithms that take advantage of the geometry of the solution space predicted by 1-RSB (e.g. the celebrated survey propagation algorithm [BMZ05]) have greatly extended the range of instances that can be solved in practice.
Rigorous results on the cluster structure of solutions of random CSP are emerging: Mezard et al. [MMZ05a] have developed an ingenious method for proving that the distribution of overlap values of random k-SAT, with k ≥ 9, indeed has discontinuous support. Their approach is based on the following concepts:

Definition 1. The overlap of two assignments A and B for a formula Φ on n variables, denoted by overlap(A, B), is the fraction of variables on which the two assignments agree. This is similar to [MMZ05a] and linearly related to the notion of overlap from the statistical physics literature, where truth values are modeled by +1 and −1 instead of 0/1. Formally, overlap(A, B) = |{i : A(x_i) = B(x_i)}| / n.
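As a small illustration of Definition 1, the Python sketch below computes overlap(A, B) for 0/1 assignments given as equal-length sequences. It also computes the ±1 ("spin") overlap used in the physics literature. The function names are ours, chosen for illustration.

```python
from typing import Sequence


def overlap(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions on which two 0/1 assignments agree (Definition 1)."""
    if len(a) != len(b) or not a:
        raise ValueError("assignments must be nonempty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def spin_overlap(a: Sequence[int], b: Sequence[int]) -> float:
    """Physics-style overlap (1/n) * sum_i s_i * t_i with spins s_i = 2*a_i - 1."""
    if len(a) != len(b) or not a:
        raise ValueError("assignments must be nonempty and of equal length")
    return sum((2 * x - 1) * (2 * y - 1) for x, y in zip(a, b)) / len(a)


A = [0, 1, 1, 0]
B = [0, 1, 0, 0]
print(overlap(A, B))       # 0.75
print(spin_overlap(A, B))  # 0.5
```

If the assignments agree on a of the n positions, the spin sum is a − (n − a). Hence spin_overlap = 2·overlap − 1, which is the linear relation mentioned in Definition 1.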

The distribution of overlaps is, indeed, the order parameter that was originally used to study the phase transition in random k-SAT [MZ97].

Definition 2. q-overlap-k-SAT: Given a k-CNF formula Φ on n variables, decide whether Φ has two satisfying assignments A and B such that overlap(A, B) ∈ [q − 1/√n, q + 1/√n]. Following the suggestion in [MMZ05a], we use the function 1/√n for the width of the possible overlap around q; as discussed there, similar results are obtained with any "reasonably large" function b(n) = o(n). We will refer to this event as "A and B have overlap approximately equal to q".
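For tiny formulas, q-overlap-k-SAT can be decided by exhaustive search, which makes Definition 2 concrete. The Python sketch below is exponential in n and purely illustrative. It encodes literals DIMACS-style (i for x_i, -i for its negation) and allows A = B, which is our assumption since the definition does not say whether the two assignments must be distinct. The helper names are hypothetical.

```python
from itertools import combinations_with_replacement, product
from math import sqrt
from typing import List, Optional, Sequence, Tuple

Clause = Sequence[int]  # DIMACS-style literals: i means x_i, -i means NOT x_i (1-based)


def satisfies(assignment: Sequence[int], cnf: List[Clause]) -> bool:
    """True iff the 0/1 tuple satisfies every clause of the CNF."""
    return all(
        any(assignment[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause)
        for clause in cnf
    )


def overlap(a: Sequence[int], b: Sequence[int]) -> float:
    return sum(x == y for x, y in zip(a, b)) / len(a)


def q_overlap_witness(cnf: List[Clause], n: int,
                      q: float) -> Optional[Tuple[Tuple[int, ...], Tuple[int, ...]]]:
    """Return satisfying (A, B) with |overlap(A, B) - q| <= 1/sqrt(n), or None."""
    solutions = [a for a in product((0, 1), repeat=n) if satisfies(a, cnf)]
    width = 1 / sqrt(n)
    for a, b in combinations_with_replacement(solutions, 2):  # A == B allowed
        if abs(overlap(a, b) - q) <= width:
            return a, b
    return None


print(q_overlap_witness([[1], [-1]], 1, 0.5))  # None: the formula is unsatisfiable
```

For very small n the window 1/√n is wide, so the example only illustrates the definition, not the asymptotic behaviour.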

For every value of q, the probability that a random k-SAT formula has two satisfying assignments with overlap ≈ q is monotonically decreasing with the constraint density, and empirically drops from 1 to 0 around a critical value c_{k,q} of the constraint density. If one can show that the function W : q ↦ c_{k,q} is not monotonic, then there exists a critical value c* such that the horizontal line at c* intersects the graph of W at multiple points. Therefore (Fig. 1(b)) the distribution of overlaps in a random k-SAT formula of constraint density c* has discontinuous support. These results were further extended, for k-SAT, k ≥ 9, by Achlioptas and Ricci-Tersenghi [ART06].

Fig. 1. (a) Structure of the solution space according to 1-RSB predictions. (b) Graphical description of the method used in [MMZ05a] to prove the discontinuity of the support of the overlap distribution.

Our ultimate goal is to understand the underlying reasons for the emergence of clustering in random CSP, with an attempt at a precise classification. We investigate the nature of overlap distributions of CSP under the random model defined and investigated by Molloy [Mol02]. We cannot obtain a complete classification: whether the results in [MMZ05a, ART06] extend to random 3-SAT is a more subtle problem (see [MS07]). Instead, we prove two partial results. Theorem 1 shows that the first step of Mezard's approach can be applied to all random CSP problems with a sharp threshold. In contrast, in Theorem 2 we show that the satisfying assignments of random instances of 2-SAT in the satisfiable phase form a single cluster, and can yield all possible values of the overlap. This confirms the prediction [MZ97] that the solution space of 2-SAT has a different nature, described by the so-called "replica symmetric" approach. The two results above are also naturally related to results of Gopalan et al. [GKMP06]. They proved a dichotomy theorem for the complexity of deciding whether the set of satisfying assignments of a CSP is connected (under the usual notion of adjacent assignments). One ingredient of the result is a restriction (called tightness) on the nature of the constraints involved. Theorem 2 provides a natural example of a CSP with tight constraints for which there is evidence (the continuity of the overlap distribution) that symmetry breaking does not take place. It also shows that, to be really meaningful, the definition of adjacent assignments from [GKMP06] should be somewhat modified.

2 Preliminaries 

Throughout the paper we will assume familiarity with the general concepts of phase transitions in combinatorial problems (see e.g. [MMZ01]) and random structures. One paper whose concepts and methods we use in detail (and with which we assume greater familiarity) is [Fri99]. Consider a monotonically increasing problem A = (A_n) under the constant probability model Γ(n, p). For ε > 0 let p_ε = p_ε(n) denote the canonical probability such that Pr_{x ∈ Γ(n, p_ε(n))}[x ∈ A] = ε. The probability that a random sample x satisfies property A (i.e. x ∈ A) is a monotonically increasing function of p. Problem A has a sharp threshold iff for every 0 < ε < 1/2 we have lim_{n→∞} (p_{1−ε}(n) − p_ε(n)) / p_{1/2}(n) = 0. A has a coarse threshold if for some ε > 0 it holds that lim_{n→∞} (p_{1−ε}(n) − p_ε(n)) / p_{1/2}(n) > 0.

Related definitions can be given for the other two models for generating random structures, the counting model and the multiset model [Bol85]. Under reasonable conditions [Bol85] these models are equivalent, and we will liberally switch between them. In particular, for a satisfiability problem A and an instance Φ of A, c_A(Φ) will denote its constraint density, the ratio between the number of clauses and the number of variables of Φ. To specify the random model in these latter cases we have to specify the constraint density as a function of n, the number of variables. We will use c_A to denote the value of the constraint density c_A(Φ) (in the counting/multiset models) corresponding to taking p = p_{1/2} in the constant probability model. c_A is a function of n that is believed to tend to a constant as n → ∞. However, Friedgut's proof [Fri99] of a sharp threshold in k-SAT (and our results) leave this issue open.

Definition 3. Let D = {0, 1, ..., t − 1}, t ≥ 2, be a fixed set. Consider the set of all 2^{t^k} − 1 potential nonempty constraints on k variables X_1, ..., X_k. We fix a set of constraints C and define the random model CSP(C). A random formula from CSP_{n,p}(C) is specified by the following procedure: (i) n is the number of variables; (ii) for each k-tuple of ordered distinct variables (x_1, ..., x_k) and each C ∈ C, add constraint C(x_1, ..., x_k) independently with probability p. We will write SAT(C) instead of CSP(C) for boolean constraint satisfaction problems (i.e. t = 2).
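The sampling procedure of Definition 3 translates directly into code. In the Python sketch below, each constraint is represented as the set of allowed k-tuples of domain values (our choice of representation), and the loop over ordered k-tuples of distinct variables is the naive O(n^k·|C|) one. The names `random_csp`, `satisfies` and `OR2` are illustrative.

```python
import random
from itertools import permutations
from typing import FrozenSet, List, Optional, Sequence, Tuple

Relation = FrozenSet[Tuple[int, ...]]  # the set of allowed k-tuples of domain values


def random_csp(n: int, p: float, constraint_set: Sequence[Relation], k: int,
               seed: Optional[int] = None) -> List[Tuple[Relation, Tuple[int, ...]]]:
    """Sample a formula from CSP_{n,p}(C) following Definition 3."""
    rng = random.Random(seed)
    formula = []
    for scope in permutations(range(n), k):   # ordered k-tuples of distinct variables
        for relation in constraint_set:
            if rng.random() < p:              # each (scope, C) pair independently w.p. p
                formula.append((relation, scope))
    return formula


def satisfies(assignment: Sequence[int], formula) -> bool:
    """An assignment maps variable index -> domain value."""
    return all(tuple(assignment[v] for v in scope) in relation
               for relation, scope in formula)


# Example constraint set: the single boolean constraint "x OR y" (t = 2, k = 2).
OR2: Relation = frozenset({(0, 1), (1, 0), (1, 1)})
phi = random_csp(5, 0.3, [OR2], k=2, seed=1)
print(len(phi), satisfies([1] * 5, phi))
```

With p = 1 every one of the n(n − 1)···(n − k + 1)·|C| possible constraints is present. With p = 0 the formula is empty.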

Definition 4. Let D = {0, 1, ..., t − 1}, t ≥ 2, be a fixed set. Let q be a real number in the range [0, 1]. The problem q-overlap-CSP(C) is the decision problem specified as follows: (i) the input is an instance Φ of CSP_{n,p}(C); (ii) the decision problem is whether Φ has two satisfying assignments A, B such that overlap(A, B) ∈ [q − 1/√n, q + 1/√n]. Following [MMZ05b], we will informally refer to this property as "Φ is q-satisfiable". The random model for q-overlap-CSP(C) is simply the one for CSP_{n,p}(C). We will refer to this class of problems as fixed-overlap CSP.

The notion of adjacent satisfying assignments used in [ART06], while adequate for random k-SAT, is not suited for other random CSP. For instance, it is impossible to flip exactly one bit in a satisfying assignment of an instance of 1-in-k SAT [ACIM01] and still obtain a satisfying assignment, except when that variable does not appear in the formula. Thus we will use the following setup. Let f(n) = o(n) be a suitably large function; we will assume that lim_{n→∞} f(n)/log n = ∞. Two satisfying assignments that differ on at most f(n) variables will be called adjacent. A cluster is a connected component of the set of satisfying assignments.
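The notions of adjacency and clusters can be explored on toy instances by brute force. The Python sketch below groups a list of satisfying assignments into clusters using union-find, treating two assignments as adjacent when their Hamming distance is at most `radius`. Here `radius` is a hypothetical fixed stand-in for the asymptotic function f(n), and the helper names are ours.

The example uses a single 1-in-3 constraint. Its three solutions are pairwise at Hamming distance 2. They therefore form three clusters under the one-flip notion of adjacency, but a single cluster once two-bit moves are allowed.

```python
from itertools import combinations, product
from typing import List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


def clusters(solutions: List[Tuple[int, ...]], radius: int) -> List[List[Tuple[int, ...]]]:
    """Connected components of the graph 'Hamming distance <= radius' on `solutions`."""
    parent = list(range(len(solutions)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(solutions)), 2):
        if hamming(solutions[i], solutions[j]) <= radius:
            parent[find(i)] = find(j)
    groups = {}
    for i, s in enumerate(solutions):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


def one_in_three_solutions(n: int, triples) -> List[Tuple[int, ...]]:
    """All 0/1 assignments with exactly one true variable in every triple."""
    return [a for a in product((0, 1), repeat=n)
            if all(sum(a[v] for v in t) == 1 for t in triples)]


sols = one_in_three_solutions(3, [(0, 1, 2)])
print(len(clusters(sols, 1)), len(clusters(sols, 2)))  # 3 1
```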

3 Results 

In this section we study the sharpness of the threshold for the random generalized constraint satisfaction problems defined by Molloy [Mol02]. Creignou and Daude [CD04] (and independently the author of this paper [Ist05]) have characterized the boolean CSP problems SAT(C) with a sharp threshold:



Definition 5. A set of constraints C is interesting if there exist constraints C_0, C_1 ∈ C with C_0(0) = C_1(1) = 0, where 0 (1) denotes the "all zeros" ("all ones") assignment. Constraint C_2 is an implicate of C_1 iff every satisfying assignment for C_1 satisfies C_2. A boolean constraint C strongly depends on a literal if it has a unit clause as an implicate. A boolean constraint C strongly depends on a 2-XOR relation if ∃ i, j ∈ {1, ..., k} such that the constraint "x_i ≠ x_j" is an implicate of C.

Proposition 1. [CD04, Ist05] Consider a generalized satisfiability problem SAT(C) with C interesting.
(i) If some constraint in C strongly depends on one literal, then SAT(C) has a coarse threshold.
(ii) If some constraint in C strongly depends on a 2XOR-relation, then SAT(C) has a coarse threshold.
(iii) In all other cases SAT(C) has a sharp threshold.
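The conditions of Definition 5 and the case split of Proposition 1 are finite checks on the constraints, so they can be mechanized. The Python sketch below represents each boolean constraint as its set of satisfying 0/1 tuples and assumes all constraints in C have the same arity k. It reports which case of Proposition 1 applies, returning None when C is not interesting, since Proposition 1 then says nothing. The names are illustrative, not taken from [CD04] or [Ist05].

```python
from itertools import combinations, product
from typing import FrozenSet, Iterable, Optional, Tuple

Relation = FrozenSet[Tuple[int, ...]]  # satisfying 0/1 tuples of a k-ary constraint


def is_interesting(constraints: Iterable[Relation], k: int) -> bool:
    """Some constraint rejects all zeros and some constraint rejects all ones."""
    cs = list(constraints)
    zeros, ones = (0,) * k, (1,) * k
    return any(zeros not in c for c in cs) and any(ones not in c for c in cs)


def depends_on_literal(c: Relation, k: int) -> bool:
    # The unit clause "x_i = v" is an implicate iff every satisfying tuple has t[i] == v.
    return any(all(t[i] == v for t in c) for i in range(k) for v in (0, 1))


def depends_on_2xor(c: Relation, k: int) -> bool:
    # "x_i != x_j" is an implicate iff every satisfying tuple has t[i] != t[j].
    return any(all(t[i] != t[j] for t in c) for i, j in combinations(range(k), 2))


def threshold_type(constraints: Iterable[Relation], k: int) -> Optional[str]:
    """Case of Proposition 1 that applies, or None if C is not interesting."""
    cs = list(constraints)
    if not is_interesting(cs, k):
        return None
    if any(depends_on_literal(c, k) or depends_on_2xor(c, k) for c in cs):
        return "coarse"
    return "sharp"


def clause(signs: Tuple[int, ...]) -> Relation:
    """k-clause with literal polarities `signs` (1 = positive literal)."""
    falsifying = tuple(1 - s for s in signs)
    return frozenset(t for t in product((0, 1), repeat=len(signs)) if t != falsifying)


three_sat = [clause(s) for s in product((0, 1), repeat=3)]
print(threshold_type(three_sat, 3))                      # sharp
print(threshold_type([frozenset({(0, 1), (1, 0)})], 2))  # coarse
```

The first example is random 3-SAT with all eight sign patterns. The second uses only the 2-XOR constraint "x ≠ y".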

Mora et al. [MMZ05b] proved that all problems q-overlap-k-SAT, k ≥ 2, have a sharp threshold. We extend this result by showing that for all CSP with a sharp threshold, their fixed-overlap versions also have a sharp threshold:

Theorem 1. Consider a generalized satisfiability problem SAT(C) such that (i) C is interesting; (ii) no constraint in C strongly depends on a literal; (iii) no constraint in C strongly depends on a 2XOR-relation. Then for all values q ∈ (0, 1] the problem q-overlap-SAT(C) has a sharp threshold.

The previous result does not yet rigorously prove the existence of the curve W, since it does not prove that the phase transition in the q-overlap versions happens at some constant constraint density c_q.

Given the previous result, how can a problem SAT(C) have an overlap distribution with continuous support? Obviously, the second step of the approach in [MMZ05a] must fail. This happens when the location c_q of the transition for the q-overlap version of SAT(C) is a monotonic function of the overlap q. The next result gives a natural problem for which this is indeed the case:

Theorem 2. The following are true:

(i) Let c < 1. Then with probability 1 − o(1) the satisfying assignments of a random instance of 2-SAT of constraint density c form a single cluster.

(ii) Let q ∈ (0, 1] and c < 1. Then with probability 1 − o(1) a random instance of 2-SAT of constraint density c is q-satisfiable.

4 Proof of Theorem 1

Before presenting the proof, let us remark that for boolean constraints, the hypothesis of Theorem 1 implies that the set of constraints C is well-behaved. That is [Mol02], every formula whose hypergraph is tree-like or unicyclic is satisfiable. This is, for instance, an easy consequence of conditions (D0), (D1) of Theorem 4.1 in [CD04].
Also, since C is interesting, there exist constraints Γ_0, Γ_1 ∈ C such that Γ_0(x_1, ..., x_k) ⊨ x̄_1 ∨ ... ∨ x̄_k and Γ_1(x_1, ..., x_k) ⊨ x_1 ∨ ... ∨ x_k.



We will employ the Friedgut-Bourgain criterion for the existence of a sharp threshold of a monotonic property A. Note that any problem q-overlap-SAT(C) is indeed monotone, since adding clauses can only reduce the set of satisfying assignments, in particular decreasing the probability of q-satisfiability. The starting point of all applications of the Friedgut-Bourgain criterion is the observation that if a monotone property A has a coarse threshold, then there exist 0 < ε < 1/2, p* = p*(n) ∈ [p_{1−ε}, p_ε] and C > 0 such that p · dμ_p(A)/dp |_{p = p*(n)} < C. Bourgain and Friedgut have shown that the following holds:

Proposition 2. Suppose p = o(1) is such that p · dμ_p(A)/dp |_{p = p*(n)} < C. Then there is δ = δ(C) > 0 such that either μ_p(x ∈ {0, 1}^n | x contains x' ∈ A of size |x'| ≤ 10C) > δ, or there exists x' ∉ A of size |x'| ≤ 10C such that μ_p(x ∈ A | x ⊇ x') > μ_p(A) + δ.

(In fact, in [Fri99] the proposition is stated assuming, for convenience, that p = p_{1/2}, but this is not needed; we give here the general statement.) We will need an enhancement of the Bourgain-Friedgut result that was given by Friedgut in [Fri05]. For a finite set of words W, define the filter generated by W as F(W) = {x | (∃y ∈ W) x ⊇ y}. Friedgut noted ([Fri05], remarks on pages 5-6 of that paper) that the set W of "booster" sets x' in the second condition satisfies μ_p(F(W)) = Ω(1).

Consider now a set of constraints C satisfying the conditions of the Theorem, and let A = q-overlap-SAT(C). Apply Proposition 2, enhanced by the previous observation, and take into account the fact that the number of isomorphism types of formulas of size at most 10C is finite. We infer that we can assume that the formula x' in the second condition appears with probability Ω(1) as a subformula of a random formula in q-overlap-SAT_p(C). Furthermore, instead of conditioning on the presence of x' as a subset of x, one can add it. Finally, because random constraint satisfaction problems are invariant under variable renaming, one only needs to add a random copy of x'. Putting all these observations together, the following version of Proposition 2 holds:

Proposition 3. Suppose p = o(1) is such that p · dμ_p(A)/dp |_{p = p*(n)} < C. Then there is δ = δ(C) > 0 such that either

$$\mu_p\big(x \in \{0,1\}^n \mid x \text{ contains } x' \in A \text{ of size } |x'| \le 10C\big) > \delta \qquad (1)$$

or there exists F ∉ A of size |F| ≤ 10C such that

- Formula F appears with probability Ω(1) as a subformula of a random formula in CSP_p(C).

- If F̃ denotes the formula obtained by creating a copy of F on a random set of variables, then

$$\mu_p\big(x \cup \tilde{F} \in A\big) > \mu_p(A) + \delta. \qquad (2)$$

To show that random q-overlap-SAT(C) has a sharp threshold, we reason by contradiction. Assuming this is not the case, one needs to prove that neither of the two conditions in Proposition 3 holds.



Suppose, indeed, that condition (1) were true. That is, with positive probability a random formula Φ ∈ CSP(C) contains some subformula of size at most 10C that is not q-satisfiable. With high probability all subformulas of a random formula Φ of size at most 10C are either tree-like or unicyclic. But because the set of constraints C is well-behaved (this is the point where the hypothesis on the constraint set C is used), all formulas in CSP(C) that are tree-like or unicyclic are satisfiable. Since such a subformula contains a finite number of variables, one can set the variables of Φ not appearing in it in a way that creates two satisfying assignments of the subformula with overlap approximately q. Therefore the first condition in Proposition 3 cannot be true.

Assume, now, that condition (2) is true. That is, there exists F ∈ q-overlap-SAT(C), a formula of size at most 10C, such that adding F to a random formula Φ ∈ CSP_p(C) diminishes the probability that the resulting formula has two assignments of overlap ≈ q by at least a constant δ. As discussed, we may assume that F occurs with probability Ω(1) in a random formula in CSP_p(C). Therefore F is tree-like or unicyclic.

Definition 6. A unit clause is a constraint (not necessarily part of the constraint set C) specified by a condition X = b, with X a variable and b ∈ {0, 1}.

Lemma 1. If F satisfies condition (2), then there exists another formula G, specified by a finite conjunction of unit clauses G = (X_1 = b_1) ∧ ... ∧ (X_s = b_s), that also satisfies condition (2).

Proof. Formula F appears with constant probability in a random CSP(C) formula with probability p, and has constant size. Therefore F is either tree-like or unicyclic, hence satisfiable. The result follows easily by replacing F with the formula G consisting of the conjunction of unit constraints corresponding to a satisfying assignment of F. Indeed, G is tighter than F, so adding a random copy of G instead of a random copy of F can only decrease the probability that the resulting formula is q-satisfiable. □

The key to refuting condition (2) is to show the following. If it did hold then, for every monotonically increasing function f(n) that tends to infinity, we could also increase the probability of unsatisfiability by a positive constant if, instead of conditioning on x containing a copy of F, we add f(n) random constraints from the set C. We first prove:

Lemma 2. Let 0 < τ < 1 be a constant and let p be such that μ_p(q-overlap-SAT(C)) ≥ τ. Assume that r ≥ 1 and that b_1, b_2, ..., b_r are elements of {0, 1} such that, when (X_1, X_2, ..., X_r) is a random r-tuple of distinct variables,

$$\Pr\big(\Phi \text{ has sat. assign. } A, B \text{ of overlap} \approx q \text{ with } X_1 = b_1, \ldots, X_r = b_r\big) < \frac{\tau}{2}. \qquad (3)$$

Then there exists a constant m ≥ 1 (that only depends on k, r, τ) such that the following holds. Let η denote a formula from CSP(C) obtained by adding, for each x ∈ {0, 1}, m · r · 2^k random copies of Γ_x. Then

$$\Pr\big(\Phi \cup \eta \in q\text{-overlap-SAT}(\mathcal{C})\big) \le \frac{1}{2}\,\Pr\big(\Phi \in q\text{-overlap-SAT}(\mathcal{C})\big). \qquad (4)$$

Proof. For i ∈ {1, ..., r} define A_i to be the event that the formula Φ has a pair of satisfying assignments of overlap ≈ q with X_1 = b_1, ..., X_i = b_i. Also define A_0 to be the event that Φ ∈ q-overlap-SAT(C). The hypothesis translates as the fact that both inequalities Pr(A_0) ≥ τ and Pr(A_r) < τ/2 are true. Therefore

$$\Pr(A_r \mid A_0) = \frac{\Pr(A_r \wedge A_0)}{\Pr(A_0)} < \frac{\tau/2}{\tau} = \frac{1}{2}.$$

Since A_r ⊆ A_{r−1} ⊆ A_0, we have

$$\mu_r := \Pr[\overline{A_r} \mid A_0] = \Pr[\overline{A_{r-1}} \mid A_0] + \Pr[\overline{A_r} \mid A_{r-1} \wedge A_0] \cdot \Pr[A_{r-1} \mid A_0] > \frac{1}{2}. \qquad (5)$$

But Pr[Ā_r | A_{r−1} ∧ A_0] = Pr[Ā_r | A_{r−1}] is the fraction of variables in the formula Φ ∧ (X_1 = b_1) ∧ ... ∧ (X_{r−1} = b_{r−1}) that have to receive values different from b_r in order for the resulting formula to still have two satisfying assignments of overlap ≈ q. Let C_r be the set of such variables. Suppose that, instead of the last unit constraint, we add a random copy of the constraint Γ_{b_r}. Then the resulting formula is not in q-overlap-SAT(C) when all the variables appearing in the new constraint are in the set C_r. Denote λ_r = Pr[Ā_r | A_{r−1}]. The probability of this last event is λ_r^k (1 − o(1)), since we choose a k-tuple of distinct variables from a set of density λ_r. Thus the probability that the new formula is not in q-overlap-SAT(C), given Φ ∈ q-overlap-SAT(C), is at least

$$\nu_r := \Pr[\overline{A_{r-1}} \mid A_0] + \lambda_r^k (1 - o(1)) \cdot \Pr[A_{r-1} \mid A_0].$$

Applying Jensen's inequality to the convex function f(x) = x^k and using inequality (5), we infer

$$\frac{1}{2^k} < \mu_r^k = \Big(\Pr[\overline{A_{r-1}} \mid A_0] \cdot 1 + \lambda_r \cdot \Pr[A_{r-1} \mid A_0]\Big)^k \le \Pr[\overline{A_{r-1}} \mid A_0] \cdot 1^k + \lambda_r^k \cdot \Pr[A_{r-1} \mid A_0] = \nu_r \cdot (1 + o(1)).$$

Thus ν_r > 2^{−k} · (1 − o(1)). The conclusion of this long argument concerns replacing the r-th unit constraint by one random copy of Γ_{b_r}. Given Φ ∈ q-overlap-SAT(C), this makes the resulting formula fall outside q-overlap-SAT(C) with probability at least 2^{−k} · (1 − o(1)). Now add the copy of the constraint before the first r − 1 unit constraints and repeat the argument recursively. Suppose that, instead of adding the r unit constraints to Φ, we add r random copies of Γ_{b_1}, ..., Γ_{b_r}. Then the probability that the resulting formula does not belong to q-overlap-SAT(C), given that Φ ∈ q-overlap-SAT(C), is at least γ_r = 2^{−kr}(1 − o(1)). Since the values b_1, ..., b_r can repeat themselves, the same is true if we add r random copies of Γ_x for every x.

Suppose now that we add r · m · 2^k copies of each Γ_x, that is, we repeat the random experiment m · 2^k times, for some integer m ≥ 1. The probability that none of the experiments takes the resulting formula out of q-overlap-SAT(C) is at most (1 − γ_r)^{m·2^k}. For some constant m this is at most 1/2. This means that Pr(Φ ∪ η ∈ q-overlap-SAT(C)) ≤ ½ · Pr(Φ ∈ q-overlap-SAT(C)). □
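To get a feel for the size of the constant m in Lemma 2, one can drop the o(1) terms and set γ_r = 2^{−kr}. One then searches for the smallest integer m with (1 − γ_r)^{m·2^k} ≤ target. The Python sketch below does this. The default target 1/2 and the function name are assumptions made for illustration.

```python
def min_repetitions(k: int, r: int, target: float = 0.5) -> int:
    """Smallest integer m >= 1 with (1 - 2**(-k*r)) ** (m * 2**k) <= target.

    Ignores the (1 - o(1)) correction factors; purely illustrative.
    """
    if not 0 < target < 1:
        raise ValueError("target must lie in (0, 1)")
    gamma = 2.0 ** (-k * r)  # lower bound on the failure probability of one experiment
    m = 1
    while (1 - gamma) ** (m * 2 ** k) > target:
        m += 1
    return m


print(min_repetitions(2, 1))  # 1
print(min_repetitions(3, 2))  # 6
```

Since γ_r shrinks exponentially in kr, m grows roughly like ln(1/target) · 2^{k(r−1)}. It is nevertheless a constant independent of n, which is all the argument needs.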
We can now refute condition (2) directly, thus obtaining a contradiction. To do so, we employ the following result (Lemma 3.1 in [AF99]¹):

Lemma 3. For a monotone property A let μ(p) = Pr[G ∈ Γ(n, p) has property A], and let μ⁺(p, M) = Pr[G_1 ∪ G_2 has property A], where G_1 ∈ Γ(n, p) and G_2 ∈ Γ(n, M).

Let A = A(n) ⊆ {0, 1}^n be a monotone property and M = M(n) such that M = o(√(np)). Then |μ(p) − μ⁺(p, M)| = o(1).

¹ Achlioptas and Friedgut assume A to be a monotone graph property, but this fact is not used anywhere in their proof.



We obtain a contradiction in the following way. Consider a random formula η with f(n) clauses, for some f(n) → ∞. It is easy to show that the probability that η contains, for some x, fewer than r · m · 2^k copies of Γ_x (with r, m as in Lemma 2) is o(1). So adding η (instead of the random formula in Lemma 2) decreases the probability of q-satisfiability by at least δ − o(1). But this contradicts the conclusion of Lemma 3. □



5 Proof of Theorem 2

We will use the well-known graph-theoretic interpretation of 2-CNF formulas. It associates to a given formula Φ on n variables a directed graph G_Φ with 2n vertices {x_1, ..., x_n, x̄_1, ..., x̄_n}, and for every clause C = α ∨ β it adds the directed edges ᾱ → β and β̄ → α to G_Φ. We will need a number of results from [RPF99] concerning the structure of the graph G_Φ when Φ is a random formula of constraint density c < 1.
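The implication graph G_Φ is also the basis of the classical linear-time 2-SAT test of Aspvall, Plass and Tarjan: Φ is unsatisfiable iff some variable x lies in the same strongly connected component of G_Φ as x̄. The Python sketch below is an illustration, not code from the paper. It builds G_Φ with DIMACS-style literals (i for x_i, -i for x̄_i), computes strongly connected components with Kosaraju's algorithm, and applies that test.

The helper `random_2sat` samples round(c·n) clauses in the counting model, each on two distinct variables with independent random signs. These sampling details are our assumption, not a specification from the paper.

```python
import random
from collections import defaultdict
from typing import List, Optional, Sequence, Tuple

Clause = Tuple[int, int]  # (a, b) encodes the clause a OR b


def vertex(lit: int, n: int) -> int:
    """Map a literal to a vertex of G_Phi: x_i -> i-1, NOT x_i -> n+i-1."""
    return lit - 1 if lit > 0 else n - lit - 1


def implication_graph(n: int, clauses: Sequence[Clause]):
    adj = defaultdict(list)
    for a, b in clauses:
        adj[vertex(-a, n)].append(vertex(b, n))  # NOT a -> b
        adj[vertex(-b, n)].append(vertex(a, n))  # NOT b -> a
    return adj


def scc_ids(num_vertices: int, adj) -> List[int]:
    """Kosaraju's algorithm (iterative); returns a component id per vertex."""
    order, visited = [], [False] * num_vertices
    for s in range(num_vertices):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if not visited[w]:
                    visited[w] = True
                    stack.append((w, iter(adj[w])))
                    break
            else:  # all successors explored: record finishing order
                stack.pop()
                order.append(v)
    reverse = defaultdict(list)
    for v in range(num_vertices):
        for w in adj[v]:
            reverse[w].append(v)
    comp, label = [-1] * num_vertices, 0
    for s in reversed(order):  # decreasing finishing time, on the reversed graph
        if comp[s] != -1:
            continue
        comp[s], stack = label, [s]
        while stack:
            v = stack.pop()
            for w in reverse[v]:
                if comp[w] == -1:
                    comp[w] = label
                    stack.append(w)
        label += 1
    return comp


def two_sat_satisfiable(n: int, clauses: Sequence[Clause]) -> bool:
    comp = scc_ids(2 * n, implication_graph(n, clauses))
    return all(comp[i] != comp[n + i] for i in range(n))


def random_2sat(n: int, c: float, seed: Optional[int] = None) -> List[Clause]:
    rng = random.Random(seed)
    clauses = []
    for _ in range(round(c * n)):
        u, v = rng.sample(range(1, n + 1), 2)
        clauses.append((u if rng.random() < 0.5 else -u, v if rng.random() < 0.5 else -v))
    return clauses


print(two_sat_satisfiable(200, random_2sat(200, 0.5, seed=42)))
```

Averaging `two_sat_satisfiable` over many seeds for several values of c gives a rough empirical picture of the 2-SAT transition. Small n of course says nothing rigorous about the asymptotic statements used below.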

Definition 7. A cycle is a set ℓ_1 → ℓ_2, ℓ_2 → ℓ_3, ..., ℓ_h → ℓ_1 of directed edges. Two cycles C_1, C_2 are overlapping if they share at least one edge. Two cycles C_1, C_2 are connected by a path if there exist vertices x ∈ C_1, y ∈ C_2 and a path (possibly of length zero, i.e. x = y) from x to y.

Lemma 4. Let t = t(n) be such that 1 = o(t). Let Φ be a random 2-CNF formula of constraint density c < 1 and G_Φ its associated digraph. With probability 1 − o(1) the following are true: (i) G_Φ contains no cycles connected by a path; (ii) G_Φ contains no overlapping cycles; (iii) the sum of all the cycle lengths is less than t.

To these results we add the following claim, whose proof is similar to that of Lemma 4(i) from [RPF99]: with probability 1 − o(1) no literal implies literals in two different cycles.

We can thus divide the literals of the formula into four classes: (i) those that are on 
a cycle, (ii) those that are not on a cycle, but imply a literal on a cycle, (iii) those that are 
not on a cycle, but are implied by a literal on a cycle, (iv) those that are not on a cycle 
and neither imply nor are implied by a literal on a cycle. 

Definition 8. A literal x is bad if there exists y such that x → y and x → ȳ.

We first claim that there is a function h(n) = o(n) such that with probability 1 − o(1) the number of bad literals is at most h(n). Indeed, all bad literals can only be set to false in any satisfying assignment of the formula. This means that a bad literal belongs to the spine of the formula [BBC+01]. But a standard argument (see e.g. [IPB05]) shows that the size of the spine is o(n).
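Definition 8 can be checked directly on G_Φ. The Python sketch below reads "x → y" as the existence of a directed path with at least one edge in G_Φ; this reading is our interpretation. It marks x as bad when the set of literals reachable from x contains some literal together with its negation. Literals are DIMACS-style signed integers, and all names are illustrative.

```python
from collections import defaultdict, deque
from typing import Sequence, Set, Tuple


def implication_graph(clauses: Sequence[Tuple[int, int]]):
    adj = defaultdict(set)
    for a, b in clauses:  # clause a OR b gives NOT a -> b and NOT b -> a
        adj[-a].add(b)
        adj[-b].add(a)
    return adj


def reachable(adj, start: int) -> Set[int]:
    """Literals reachable from `start` by a directed path with at least one edge."""
    seen: Set[int] = set()
    queue = deque(adj[start])
    while queue:
        v = queue.popleft()
        if v not in seen:
            seen.add(v)
            queue.extend(adj[v])
    return seen


def bad_literals(n: int, clauses: Sequence[Tuple[int, int]]) -> Set[int]:
    """Literals x with x -> y and x -> NOT y for some y (Definition 8).

    If x were TRUE, both y and NOT y would have to be TRUE, so every
    satisfying assignment sets a bad literal to FALSE.
    """
    adj = implication_graph(clauses)
    bad = set()
    for lit in [*range(1, n + 1), *range(-n, 0)]:
        r = reachable(adj, lit)
        if any(-y in r for y in r):
            bad.add(lit)
    return bad


# x1 -> x2 and x1 -> NOT x2, so x1 is bad (it must be FALSE).
print(bad_literals(2, [(-1, 2), (-1, -2)]))  # {1}
```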

Bad literals (and their negations) are assigned fixed values in all satisfying assignments. This property guarantees that such literals do not influence the value of the overlap between any two satisfying assignments. Let 𝓑 be the set of such literals.

Theorem 2(i): Let A and B be two satisfying assignments of a formula Φ such that d(A, B) > log n (i.e. A and B are not adjacent). We will prove the following result:

Lemma 5. There exists a satisfying assignment C such that d(A, C) = O(log n) and d(C, B) < d(A, B). That is, C is adjacent to A and closer to B than A is.



An iterative application of the lemma proves Theorem 2(i).
Proof: Let x be a variable such that A(x) ≠ B(x) and x is implication-minimal with this property. In other words, if y ≠ x and y → x then A(y) = B(y).

Case 1: A(x) = 0 and B(x) = 1. Then B(z) = 1 for all z such that x → z. Define the assignment C by C(z) = 1 if x → z, and C(z) = A(z) otherwise. It is clear that d(C, B) < d(A, B), since C coincides with B on all bits whose value changes. To show that C is a satisfying assignment, suppose C did not satisfy some clause W = (α ∨ β). Then one of the following is true.

(1): both α and β are negations of literals implied by x. This leads to a contradiction, since it would imply that B does not satisfy clause α ∨ β either.

(2): exactly one of them (say α) is the negation of a literal implied by x. Since x → ᾱ and ᾱ → β, it follows that x → β, so C(β) = 1 and C satisfies clause W, a contradiction.

(3): none of them is the negation of a literal implied by x. Then C(α) = A(α) and C(β) = A(β), a contradiction, since A satisfies clause W.

Case 2: A(x) = 1 and B(x) = 0. Then B(z) = 0 for all z such that z → x. Define the assignment C by C(z) = 0 if z → x, and C(z) = A(z) otherwise. It is clear that d(C, B) < d(A, B), since C coincides with B on all the bits that change value, one of which is x. To show that C is a satisfying assignment, suppose C did not satisfy some clause W = α ∨ β. Then one of the following cases must hold.

(1): both α and β are literals that imply x. This leads to a contradiction, since it would mean that B does not satisfy clause α ∨ β.

(2): exactly one of them (say α) implies x. Since β̄ → α, it follows that β̄ → x. Therefore β̄ is assigned the value FALSE by C, i.e. β is assigned TRUE, a contradiction.

(3): none of α, β implies x. Then C and A coincide with respect to the values they give to α, β, a contradiction, since A satisfies clause W.
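The construction in the proof of Lemma 5 is easy to run on small formulas. In the Python sketch below (names are illustrative), `step_towards` picks a literal ℓ that is false in the current assignment and true in B, and makes every literal implied by ℓ true. This covers Case 1 with ℓ = x, and Case 2 with ℓ = x̄, since z → x iff x̄ → z̄. The function `walk` iterates this until B is reached.

For simplicity the sketch picks the first disagreeing variable rather than an implication-minimal one. The satisfiability argument of Cases 1 and 2 does not appear to use minimality, but the O(log n) bound on the size of each step is not enforced here.

```python
from collections import defaultdict, deque
from typing import List, Sequence, Tuple

Clause = Tuple[int, int]  # (a, b) encodes a OR b, with DIMACS-style literals


def implication_graph(clauses: Sequence[Clause]):
    adj = defaultdict(set)
    for a, b in clauses:
        adj[-a].add(b)  # NOT a -> b
        adj[-b].add(a)  # NOT b -> a
    return adj


def implied(adj, lit: int) -> set:
    """All literals z with lit -> z, including lit itself (path of length 0)."""
    seen, queue = {lit}, deque([lit])
    while queue:
        for w in adj[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def is_true(assignment: Sequence[int], lit: int) -> bool:
    bit = assignment[abs(lit) - 1]
    return bit == 1 if lit > 0 else bit == 0


def satisfies(assignment: Sequence[int], clauses: Sequence[Clause]) -> bool:
    return all(is_true(assignment, a) or is_true(assignment, b) for a, b in clauses)


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


def step_towards(current, target, clauses) -> Tuple[int, ...]:
    """One Lemma 5 step: make every literal implied by a disagreeing literal true."""
    adj = implication_graph(clauses)
    for v, (cur, tgt) in enumerate(zip(current, target), start=1):
        if cur != tgt:
            lit = v if tgt == 1 else -v  # false in `current`, true in `target`
            nxt = list(current)
            for z in implied(adj, lit):
                nxt[abs(z) - 1] = 1 if z > 0 else 0  # make literal z true
            return tuple(nxt)
    return tuple(current)


def walk(A, B, clauses) -> List[Tuple[int, ...]]:
    """Satisfying assignments from A to B, each strictly closer to B than the last."""
    path = [tuple(A)]
    while path[-1] != tuple(B):
        nxt = step_towards(path[-1], B, clauses)
        if not satisfies(nxt, clauses) or hamming(nxt, B) >= hamming(path[-1], B):
            raise RuntimeError("step failed; are A and B both satisfying?")
        path.append(nxt)
    return path


clauses = [(1, 2), (-1, 3)]
print(walk((0, 1, 0), (1, 0, 1), clauses))  # [(0, 1, 0), (1, 1, 1), (1, 0, 1)]
```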

Theorem 2(ii): We directly construct two satisfying assignments A and B of overlap qn ± √n. We will work with a directed weighted graph G_2 obtained from G_Φ by contracting every cycle to a node and assigning this node a weight equal to twice the size of the contracted cycle. G_2 is well-defined when cycles in G_Φ do not intersect, an event that happens (cf. Lemma 4) with probability 1 − o(1). All literals on a cycle of G_Φ must, of course, be given the same value in any satisfying assignment. Since we have contracted all cycles in G_Φ, G_2 is a directed acyclic graph. The set of nodes corresponding to bad literals is downward closed, because if x → y and y is bad then x is bad. Correspondingly, the set of nodes corresponding to negations of bad literals is upward closed.

We begin by defining a set S of nodes of G_2 that will ultimately contain half of the nodes in G_2. Nodes not chosen in S will be referred to as eliminated. In parallel we build a partial assignment by assigning to the literals corresponding to eliminated nodes the unique values that are consistent with the satisfiability of the formula. The set S is recursively specified as follows:
(i) start by defining V to be the set of all nodes in G_2;
(ii) add all nodes of indegree 0 in V to S and eliminate all nodes of outdegree 0, then set V to be the set of remaining nodes (not added to S or eliminated);
(iii) continue this process as long as V ≠ ∅.



It is easy to see that the set of literals corresponding to nodes in S contains, for every variable x, exactly one of x and x̄. Indeed, one cannot add both x and x̄ to S in one step, otherwise the pure literal implying both would be bad. But then, when adding one of them we immediately eliminate the other one. On the other hand, we only eliminate a literal when its opposite has been retained in S.

The first assignment, A, simply corresponds to setting all literals corresponding to nodes in S to TRUE. We define the second assignment iteratively by the following process:
(i) in Stage 1, choose a node of indegree zero, assign its associated variable the value FALSE and eliminate the node from S; if the eliminated node corresponds to a cycle in G_2, all variables in the cycle are set to FALSE;
(ii) when a remaining node becomes of indegree zero as a result of eliminations, it is labeled by the number of the stage that led to this happening (nodes that originally had indegree zero are labeled 0);
(iii) the literal chosen to be set to FALSE is among those with the smallest stage number;
(iv) continue the process until the number of variables assigned FALSE is in the interval [qn − √n, qn + √n]; this is possible if the sum of all cycle lengths in the formula graph of Φ is o(√n), which happens (cf. Lemma 4) with probability 1 − o(1);
(v) the remaining literals in S are set to TRUE.

Because bad literals are assigned identical values in both A and B, it is easy to see that overlap(A, B) ∈ [qn − √n, qn + √n]. We complete the proof of Theorem 2 by:




Lemma 6. A and B are satisfying assignments for Φ.

Proof: Suppose there exists a clause C = (x ∨ y) = (x̄ → y) of Φ that is not satisfied by A. Then x̄ is given the value TRUE and y is given the value FALSE. Thus either x is a bad literal or x̄ is in S. Also, either y is a bad literal or ȳ is in S.

Suppose y were a bad literal. Then, since x̄ → y, x̄ is also bad. But this contradicts both possible alternatives (x is a bad literal or x̄ is in S).

Suppose now ȳ is in S. Then C = (ȳ → x). Therefore either x ∈ S, or x is among the literals (bad literals and their negations) eliminated before defining S. The first alternative contradicts both possible alternatives (x is a bad literal or x̄ is in S), so x must be a bad literal. But then, since ȳ → x, ȳ is also bad, contradicting the assumption that ȳ is in S.

A similar argument shows that B is a satisfying assignment. Indeed, suppose there existed a clause C = (x ∨ y) = (x̄ → y) of Φ not satisfied by B. Then B(x̄) = TRUE and B(y) = FALSE. The choices compatible with this setup are:
(i) x̄ is in S and B(x̄) = TRUE, or x is bad;
(ii) y is bad, or y is in S and B(y) = FALSE, or ȳ ∈ S and B(ȳ) = TRUE (i.e. B(y) = FALSE).

First, if y were bad then so would be x̄, contradicting all possible choices in (i).

Next, suppose y is in S and is assigned FALSE. If x̄ were also in S then, since by construction of B the set of literals in S assigned FALSE is downward closed under implication, x̄ would also be assigned FALSE, a contradiction. The other possibility is that x is bad. But since ȳ → x, that would mean that ȳ is bad, contradicting the assumption that y ∈ S (negations of bad literals are eliminated before S is defined).

Finally, assume ȳ is in S and is assigned TRUE. Since ȳ → x, either x ∈ S or x is a bad literal. In the first case, since the set of literals assigned TRUE is upward closed under implication, x would be assigned TRUE by B, i.e. x̄ would be assigned FALSE, a contradiction. Suppose now that x is bad. Then, since ȳ → x, ȳ is also bad, so B(ȳ) = 0, a contradiction. □